Phone recognition in critical bands using sub-band temporal modulations
Abstract
This study investigates a multi-stream phone recognition system consisting of 21 parallel sub-systems, each covering two critical bands, whose outputs are fused by a multi-layer perceptron (MLP). Within each band, speech information is encoded by the frequency-domain linear prediction (FDLP) feature, which characterizes the temporal modulation of the sub-band envelope. Two experiments are conducted to determine the optimal parameters for the speech features, namely the maximum temporal modulation frequency Fm and the context window length T, followed by an experiment to evaluate the robustness of the fused system in noise. Results show that the phone accuracies of the sub-systems peak at a context window length of about 500–600 ms, and that they increase monotonically as the maximum temporal modulation frequency rises from 4 to 40 Hz, where they saturate. Tests of the fused system in babble and subway noise at 15 dB SNR indicate that the multi-stream system is more robust to noise than the single-stream baseline system.
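The FDLP idea mentioned in the abstract can be illustrated with a small sketch: linear prediction is applied to the DCT coefficients of a signal within one frequency band, and the resulting all-pole model, evaluated along the time axis, approximates the temporal envelope of that band. The code below is only an illustration under stated assumptions and not the system described in the abstract. The function names (dct2, levinson_durbin, fdlp_envelope), the band limits, the model order, and the toy signal are all hypothetical choices, and the critical-band windowing, the modulation limit Fm, and the context window T are not modelled.

```python
import math
import random


def dct2(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where a[0] == 1 is the prediction-error filter
    and err is the residual prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, k_lo, k_hi, order):
    """All-pole estimate of the temporal envelope of DCT band [k_lo, k_hi)."""
    n_samples = len(x)
    band = dct2(x)[k_lo:k_hi]
    # Autocorrelation of the DCT coefficients along the frequency axis.
    r = [
        sum(band[k] * band[k + lag] for k in range(len(band) - lag))
        for lag in range(order + 1)
    ]
    r[0] *= 1.0 + 1e-9  # tiny regularisation (an assumption of this sketch)
    a, err = levinson_durbin(r, order)
    envelope = []
    for n in range(n_samples):
        # Time index n maps to angle pi * (n + 0.5) / N of the all-pole model.
        w = math.pi * (n + 0.5) / n_samples
        re = sum(a[i] * math.cos(w * i) for i in range(order + 1))
        im = -sum(a[i] * math.sin(w * i) for i in range(order + 1))
        envelope.append(err / (re * re + im * im))
    return envelope


# Toy input: a noise burst centred on sample 80 of a 256-sample frame.
rng = random.Random(0)
signal = [
    math.exp(-0.5 * ((n - 80) / 10.0) ** 2) * rng.gauss(0.0, 1.0)
    for n in range(256)
]
envelope = fdlp_envelope(signal, k_lo=64, k_hi=192, order=10)
peak = max(range(len(envelope)), key=envelope.__getitem__)
```

Here peak is the sample index at which the modelled envelope is largest, which should fall near the centre of the burst. In a multi-stream setting, one such envelope would be computed per band and passed on to that band's classifier.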
Similar resources
Subband-based Speech Recognition
In the framework of Hidden Markov Models (HMM) or hybrid HMM/Artificial Neural Network (ANN) systems, we present a new approach towards automatic speech recognition (ASR). The general idea is to divide up the full frequency band (represented in terms of critical bands) into several subbands, compute phone probabilities for each sub-band on the basis of subband acoustic features, perfor...
LP-TRAPs in all senses
This report describes additional experiments with LP-TRAPs, speech features derived from an autoregressive model applied to approximate the temporal evolution of speech spectra in critical-band-sized frequency sub-bands. The importance of free parameters, such as model order, length of the approximated temporal pattern, compression factor, or number of resulting cepstral coefficients, is investigated...
Beyond a single critical-band i
TRAP-based ASR attempts to extract information from rather long (as long as 1 s) and narrow (one critical band) patches (temporal patterns) of the time-frequency plane. We investigate the effect of combining temporal patterns of logarithmic critical-band energies from several adjacent bands. The frequency context is gradually increased from one critical band to several critical bands by using temp...
Subband-based speech recognition
In the framework of Hidden Markov Models (HMM) or hybrid HMM/Artificial Neural Network (ANN) systems, we present a new approach towards automatic speech recognition (ASR). The general idea is to divide up the full frequency band (represented in terms of critical bands) into several subbands, compute phone probabilities for each sub-band on the basis of subband acoustic features, perform dynami...
Learning discriminative temporal patterns in speech: development of novel TRAPS-like classifiers
Motivated by the temporal processing properties of human hearing, researchers have explored various methods to incorporate temporal and contextual information in ASR systems. One such approach, TempoRAl PatternS (TRAPS), takes temporal processing to the extreme and analyzes the energy pattern over long periods of time (500 ms to 1000 ms) within separate critical bands of speech. In this paper w...